Abstract

We introduce two Python frameworks to train neural networks on large datasets: Blocks and
Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA support (Bastien et al.,
2012; Bergstra et al., 2010). It facilitates the training of complex neural network models by
providing parametrized Theano operations, attaching metadata to Theano’s symbolic
computational graph, and providing an extensive set of utilities to assist training the networks,
e.g. training algorithms, logging, monitoring, visualization, and serialization. Fuel provides 
a standard format for machine learning datasets. It allows the user to easily iterate over 
large datasets, performing many types of pre-processing on the fly. 

Keywords: Neural networks, GPGPU, large-scale machine learning 


1. Introduction 

Blocks and Fuel are being developed by the Montreal Institute for Learning Algorithms
(MILA) at the University of Montreal. Their focus lies on quick prototyping of complex
neural network models. The intended target audience is researchers who design and
experiment with machine learning algorithms, especially deep learning algorithms.

Several other libraries built on top of Theano exist, including Pylearn2 and GroundHog 
(also developed by MILA), Lasagne, and Keras. Like its MILA-developed predecessors, 
Blocks maintains a focus on research and rapid prototyping. Blocks differentiates itself most
notably from the above-mentioned toolkits in its unique relationship with Theano. Instead
of introducing new abstract objects representing ‘models’ or ‘layers’, Blocks annotates the
Theano computational graph, maintaining the flexibility of Theano while making large 
models manageable. 

Data processing is an integral part of training neural networks, yet many of the
aforementioned frameworks do not address it. Fuel aims to fill this gap. It provides tools to
download datasets and to iterate over and preprocess them efficiently.

Both Blocks and Fuel were developed from the very beginning with a strong focus on
software engineering best practices. The development teams strive for high test coverage, 
thorough documentation and carefully considered APIs. 


2. Blocks 

Blocks comprises several components, which can be used independently of each other.


2.1 Bricks 


Theano is a popular choice for the implementation of neural networks (see e.g. Goodfellow et al.
(2013b); Pascanu et al. (2013)). Blocks and many other libraries, such as Pylearn2 (Goodfellow et al.,
2013a), build on Theano by providing reusable components that are common in neural
networks, such as linear transformations followed by non-linear activations, or more
complicated components such as LSTM units. In Blocks these components are referred to as
bricks or “parametrized Theano operations”.

Bricks consist of a set of Theano shared variables, for example the weight matrix of a 
linear transformation or the filters of a convolutional layer. Bricks use these parameters to 
transform symbolic Theano variables. 

Bricks can contain other bricks within them. This introduces a hierarchy on top of the 
flat computational graph defined by Theano, which makes it easier to address and configure 
complex models programmatically. 
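
The idea of a brick hierarchy can be illustrated without Theano. The following is a minimal sketch in plain Python; the classes `Brick`, `Linear`, and `Sequence` and the method `walk` are hypothetical stand-ins written for this illustration and are not the Blocks API. Each brick owns its parameters, may contain child bricks, and can be addressed by a path through the hierarchy.

```python
# Hypothetical sketch: these classes are NOT the Blocks API.
class Brick:
    """A named component that owns parameters and may contain child bricks."""

    def __init__(self, name, children=()):
        self.name = name
        self.parameters = {}            # parameter name -> value
        self.children = list(children)

    def walk(self, prefix=""):
        """Yield (path, brick) pairs for this brick and all its descendants."""
        path = f"{prefix}/{self.name}"
        yield path, self
        for child in self.children:
            yield from child.walk(path)


class Linear(Brick):
    """Affine map y = W x + b, with W stored as a list of rows."""

    def __init__(self, name, input_dim, output_dim):
        super().__init__(name)
        self.parameters["W"] = [[0.0] * input_dim for _ in range(output_dim)]
        self.parameters["b"] = [0.0] * output_dim

    def apply(self, x):
        W, b = self.parameters["W"], self.parameters["b"]
        return [sum(w * xi for w, xi in zip(row, x)) + bi
                for row, bi in zip(W, b)]


class Sequence(Brick):
    """Applies its child bricks one after another."""

    def apply(self, x):
        for child in self.children:
            x = child.apply(x)
        return x


mlp = Sequence("mlp", [Linear("hidden", 3, 4), Linear("output", 4, 2)])
paths = [path for path, _ in mlp.walk()]
```

Here `paths` lists every brick by its position in the hierarchy, so a configuration step (for example, choosing an initialization scheme) could be applied to one sub-tree rather than to a flat list of variables.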

The parameters of bricks can be initialized using a variety of schemes that are popular 
in the neural network literature, such as sparse initialization, orthogonal initialization for 
recurrent weights, etc. 

Blocks comes with a large number of ‘bricks’. Besides standard activations and
transformations used in feedforward networks (maxout, convolutional layers, table lookups) these
also include a variety of more advanced recurrent neural network components like LSTM,
GRU, and support for attention mechanisms (for an overview of different kinds of network
architectures, regularization methods, and optimization algorithms see Bengio et al.).


2.2 Graph management 

Large neural networks can often result in Theano computational graphs containing hundreds 
of variables and operations. Blocks does not attempt to abstract away this complex graph, 
but to make it manageable by annotating variables in the graph. Each input, output, and 
parameter of a brick is annotated as such. Variables can also be annotated with the role 
they play in a model, such as weights, biases, filters, etc. 

A series of convenience tools were written that allow users to filter the symbolic
computational graph based on these annotations, and apply transformations to the graph. Many
regularization methods such as weight decay, weight noise, or dropout can be implemented
in a generic, model-agnostic way. Furthermore, a complex query mechanism allows for their
fine-grained application such as “apply weight noise to all weights that belong to an LSTM
unit whose parent is a brick with the name foo”.
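
A query of this kind can be sketched in plain Python. The example below assumes a simplified representation in which each variable carries a set of role labels and the path of bricks that own it; the `Variable` class and the `select` function are hypothetical and do not reproduce the filtering tools shipped with Blocks.

```python
from dataclasses import dataclass


# Hypothetical sketch: a stand-in for an annotated graph variable.
@dataclass(frozen=True)
class Variable:
    name: str
    roles: frozenset     # role labels, e.g. {"WEIGHT"} or {"BIAS"}
    brick_path: tuple    # names of the owning bricks, outermost first


def select(variables, role=None, brick=None, parent=None):
    """Return the variables that match every criterion that is given."""
    selected = []
    for var in variables:
        if role is not None and role not in var.roles:
            continue
        if brick is not None and var.brick_path[-1:] != (brick,):
            continue
        if parent is not None and var.brick_path[-2:-1] != (parent,):
            continue
        selected.append(var)
    return selected


graph = [
    Variable("W", frozenset({"WEIGHT"}), ("foo", "lstm")),
    Variable("b", frozenset({"BIAS"}), ("foo", "lstm")),
    Variable("W", frozenset({"WEIGHT"}), ("bar", "lstm")),
    Variable("W", frozenset({"WEIGHT"}), ("foo", "linear")),
]

# "All weights that belong to an LSTM unit whose parent is named foo."
targets = select(graph, role="WEIGHT", brick="lstm", parent="foo")
```

A regularizer such as weight noise would then be applied only to the variables in `targets`, independently of how the rest of the model is built.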

2.3 Training algorithms

The gradient descent training algorithm in Blocks is composed of different ‘step rules’ 
that modify the descent direction (learning rate scaling, momentum, gradient clipping, 
weight norm clipping, etc.). A variety of algorithms such as AdaGrad, ADADELTA, Adam, and
RMSProp are available as step rules.
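
The composition of step rules can be sketched as a chain of small objects, each of which transforms the step proposed by the previous one. The code below is an illustration in plain Python on lists of floats; the class names and the `compute_step` method are chosen for this sketch and should not be read as the signatures used in Blocks.

```python
import math


# Hypothetical sketch: step rules as composable transformations of a step.
class Scale:
    """Multiply every component of the step by a fixed learning rate."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def compute_step(self, step):
        return [self.learning_rate * s for s in step]


class StepClipping:
    """Rescale the step so that its L2 norm does not exceed a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_step(self, step):
        norm = math.sqrt(sum(s * s for s in step))
        if norm <= self.threshold:
            return list(step)
        return [s * self.threshold / norm for s in step]


class Momentum:
    """Accumulate an exponentially decaying sum of past steps."""

    def __init__(self, momentum):
        self.momentum = momentum
        self.velocity = None

    def compute_step(self, step):
        if self.velocity is None:
            self.velocity = [0.0] * len(step)
        self.velocity = [self.momentum * v + s
                         for v, s in zip(self.velocity, step)]
        return list(self.velocity)


class CompositeRule:
    """Apply several step rules in order."""

    def __init__(self, rules):
        self.rules = rules

    def compute_step(self, step):
        for rule in self.rules:
            step = rule.compute_step(step)
        return step


def descend(parameters, gradient, rule):
    """One gradient descent update using the step produced by the rule."""
    step = rule.compute_step(gradient)
    return [p - s for p, s in zip(parameters, step)]


rule = CompositeRule([StepClipping(threshold=1.0), Scale(learning_rate=0.5)])
updated = descend([1.0, 1.0], [3.0, 4.0], rule)
```

In this example the gradient has norm 5, so it is first clipped to norm 1 and then scaled by the learning rate before being subtracted from the parameters. Reordering or extending the list of rules changes the algorithm without touching the descent loop.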

2.4 Training 

Experiment management is performed using a ‘main loop’, which combines a Theano graph 
with a training algorithm and a Fuel data stream. The main loop has a flexible extension 
interface, which is used to perform tasks such as monitoring on a validation set, serialization, 
learning rate scheduling, plotting, printing and saving logs, etc. 
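
The extension mechanism can be sketched as a loop that calls hooks on a list of extension objects. The following toy example is written in plain Python under simplifying assumptions: the names `MainLoop`, `Extension`, `FinishAfter`, and `Monitor` are used for illustration only, there is a single `after_epoch` hook, and the "model" is one scalar fitted by gradient descent.

```python
# Hypothetical sketch: a main loop with pluggable extensions.
class Extension:
    """Base class for extensions; the hook does nothing by default."""

    def after_epoch(self, loop):
        pass


class FinishAfter(Extension):
    """Request a stop once a fixed number of epochs has been completed."""

    def __init__(self, n_epochs):
        self.n_epochs = n_epochs

    def after_epoch(self, loop):
        if loop.epochs_done >= self.n_epochs:
            loop.stop_requested = True


class Monitor(Extension):
    """Evaluate a quantity after every epoch and store it in the log."""

    def __init__(self, name, evaluate):
        self.name = name
        self.evaluate = evaluate

    def after_epoch(self, loop):
        loop.log[-1][self.name] = self.evaluate()


class MainLoop:
    def __init__(self, train_step, get_epoch_iterator, extensions):
        self.train_step = train_step                  # batch -> training cost
        self.get_epoch_iterator = get_epoch_iterator  # () -> iterator of batches
        self.extensions = extensions
        self.log = []                                 # one dict per epoch
        self.epochs_done = 0
        self.stop_requested = False

    def run(self):
        while not self.stop_requested:
            costs = [self.train_step(b) for b in self.get_epoch_iterator()]
            self.epochs_done += 1
            self.log.append({"train_cost": sum(costs) / len(costs)})
            for extension in self.extensions:
                extension.after_epoch(self)


# Toy problem: fit a scalar w to the batch means by descending (w - x)**2.
state = {"w": 0.0}


def train_step(batch):
    x = sum(batch) / len(batch)
    cost = (state["w"] - x) ** 2
    state["w"] -= 0.1 * 2 * (state["w"] - x)
    return cost


loop = MainLoop(
    train_step,
    lambda: iter([[1.0, 3.0], [2.0, 2.0]]),
    [Monitor("valid_cost", lambda: (state["w"] - 2.0) ** 2), FinishAfter(3)],
)
loop.run()
```

The loop itself knows nothing about monitoring or stopping criteria; both are supplied as extensions, which is the design choice that keeps tasks such as serialization or plotting out of the training code.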


3. Fuel 

Fuel’s goal is to provide a common interface to a variety of data formats and published 
datasets such as MNIST, CIFAR-10, ImageNet, etc. while making it easy for users to write 
an interface to new datasets. 

Blocks relies on Fuel for its data interface, but Fuel can easily be used by other machine 
learning frameworks that interface with datasets. 

3.1 Iteration and preprocessing pipeline 

Fuel allows for different ways of iterating over these datasets, such as sequential or
shuffled minibatches, support for in-memory and out-of-core datasets, and resampling
(cross-validation, bootstrapping).

It also provides a variety of on-the-fly preprocessing methods such as random
cropping of images, creating n-grams from text files, and the ability to implement many other
methods easily. These preprocessing steps can be chained together to form more complex 
transformations of the input data. 
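
Chaining of on-the-fly preprocessing steps can be sketched with Python generators, where each step consumes one stream and yields another. The functions `mapping`, `ngrams`, and `batch` below are hypothetical helpers written for this illustration; they are not Fuel's transformer classes.

```python
# Hypothetical sketch: lazily chained preprocessing steps.
def mapping(stream, function):
    """Apply a function to every example in the stream."""
    for example in stream:
        yield function(example)


def ngrams(stream, n):
    """Turn a stream of token lists into a stream of n-gram tuples."""
    for tokens in stream:
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def batch(stream, batch_size):
    """Group consecutive examples into lists of at most batch_size items."""
    buffer = []
    for example in stream:
        buffer.append(example)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


sentences = ["The cat sat", "A dog"]

# Tokenize, extract bigrams, and group them into minibatches of two.
tokens = mapping(iter(sentences), lambda s: s.lower().split())
pipeline = batch(ngrams(tokens, 2), 2)
batches = list(pipeline)
```

Because every step is a generator, no intermediate dataset is materialized; examples are transformed only when the training loop requests the next batch.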

To sidestep Python’s global interpreter lock (GIL) and ensure optimal performance,
Fuel can perform all operations in a separate process, transferring the processed data to the
training process using TCP sockets.


3.2 Standardized data format 


Datasets are distributed in a wide range of formats. Fuel simplifies dataset storage by
converting all built-in datasets to annotated HDF5 files (The HDF Group, 1997-2015). In
addition to being an efficient format for large datasets that don’t fit into memory, HDF5
is easy to organize and document. All of the data is stored in a single HDF5 file, with the
following metadata attached:


• What are the data sources available (e.g. features, targets, etc.)? 

• How are these data sources officially split (e.g. training, validation, and test sets)? 

• Are some data sources unavailable for some splits (e.g. test set only offers unlabeled 
examples)? 

• What are the axes semantics for a given data source (e.g. batch, feature, width, height,
channel, time, etc.)? 

Integrating user data into Fuel via HDF5 is straightforward, and simply requires the 
data to be written to an HDF5 file with metadata according to the specifications. Finally, 
while standardizing by convention on HDF5, the Fuel dataset API is independent of it; 
users are free to implement dataset objects employing other backends and use them with 
the rest of Fuel’s components. 

3.3 Automated data management 

Fuel offers built-in scripts that automate the task of downloading datasets (similar to e.g.
SKDATA¹) and converting them to Fuel’s HDF5 specification.

The fuel-download script is used to download raw data files. Downloading the raw
MNIST data files is as easy as typing fuel-download mnist. The fuel-convert script is
used to convert raw data files into HDF5 format.

Reproducibility being an important feature of both Fuel and Blocks, the fuel-convert
script automatically tags all files it creates with relevant module and interface versions and
the exact command that was used to generate these files. Inspection of this metadata is
done with the fuel-info script.

4. Serialization and checkpointing 

The training of large, deep neural networks can often take days or even weeks. Hence, regular 
checkpointing of training progress is important. Blocks aims to make the resumption of 
experiments entirely transparent, even across platforms, while ensuring the reproducibility 
of these experiments. 

This goal is complicated by shortcomings in Python’s pickle serialization module,
which is unable to serialize many of the iterators that Fuel heavily depends on in order to
iterate over large datasets efficiently. To circumvent this we reimplemented the itertools
module from the Python standard library to be serializable².

As a result, Blocks experiments can be interrupted in the middle of a pass over
the dataset, serialized, and resumed later, without affecting the final training results.
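
The underlying difficulty, and the kind of workaround described above, can be illustrated with a small example. In CPython a generator cannot be pickled, whereas an iterator written as a class that keeps its position in ordinary attributes can. The class `PicklableRange` below is a hypothetical, minimal example written for this illustration; it is not taken from the picklable-itertools package.

```python
import pickle


# Hypothetical sketch: an iterator whose position survives pickling.
class PicklableRange:
    """Iterate over 0, 1, ..., stop - 1, keeping the position as plain state."""

    def __init__(self, stop):
        self.stop = stop
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= self.stop:
            raise StopIteration
        value = self.position
        self.position += 1
        return value


def generator_range(stop):
    """The same iteration written as a generator, for comparison."""
    yield from range(stop)


# A partially consumed generator cannot be serialized with pickle.
gen = generator_range(5)
next(gen)
try:
    pickle.dumps(gen)
    generator_picklable = True
except TypeError:
    generator_picklable = False

# The class-based iterator can be checkpointed in the middle of a pass.
iterator = PicklableRange(5)
first_part = [next(iterator), next(iterator)]
checkpoint = pickle.dumps(iterator)

# Later, possibly in another process: resume where the pass stopped.
restored = pickle.loads(checkpoint)
second_part = list(restored)
```

Concatenating `first_part` and `second_part` yields the same sequence as an uninterrupted pass, which is the property that allows training to resume without changing the order in which examples are seen.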


5. Documentation and community 


Blocks and Fuel are well documented, with both API documentation and tutorials available
online. Two active mailing lists³ support users of the libraries. A separate repository⁴ is
maintained for users to contribute non-trivial examples of the use of Blocks.
Implementations of neural machine translation models (NMT, Bahdanau et al. (2015)) and the Deep
Recurrent Attentive Writer (DRAW, Gregor et al. (2015)) model are publicly available
examples of state-of-the-art models successfully implemented using Blocks.


1. https://jaberg.github.io/skdata/ 

2. https://github.com/mila-udem/picklable-itertools 

3. https://groups.google.com/d/forum/blocks-users and https://groups.google.com/d/forum/fuel-users 

4. https://github.com/mila-udem/blocks-examples 